Portfolio-Exam Task 2 in MADS-DVVA (Data Visualization and Visual Analytics) - Story

  • Date: 2024/06/30, SoSe 2024

  • Name: Tom Ruge

  • Student ID: 944530

Introduction

In this Jupyter notebook I investigate the music taste of 2 of my friends. The data retrieval was done in a separate Jupyter notebook, where I used the Spotify API to query the playlists and collected multiple interesting features. The goal is to compare the 2 playlists in order to find similarities and differences between them. Strictly speaking, this is therefore a comparison of the 2 playlists rather than of my friends' overall music taste.

The Data

Additional Data:

Many attributes were retrieved, such as:

  • Song titles
  • Genres
  • Artists
  • Song duration
  • Happiness of the song
  • And many more ...

Preparation

Import Packages:

In [28]:
import pandas as pd
import ast
import numpy as np
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn2_circles
from collections import Counter
from typing import Tuple, List
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
pd.set_option('display.max_columns', None) 
import plotly
plotly.offline.init_notebook_mode()

Read the combined and cleaned Data:

In [29]:
def read_data() -> Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Reads data from CSV files and returns two DataFrames.

    The function reads 'marco_combined.csv' and 'lars_combined.csv' files
    from the 'data/' directory and returns them as pandas DataFrames.

    Returns:
        Tuple[pd.DataFrame, pd.DataFrame]: A tuple containing two DataFrames,
                                           the first for Marco and the second for Lars.
    """
    path = 'data/'
    df_marco = pd.read_csv(path + 'marco_combined.csv')
    df_lars = pd.read_csv(path + 'lars_combined.csv')
    return df_marco, df_lars

# Read the data
df_marco, df_lars = read_data()

Data Preprocessing:

Each song can have multiple artists and multiple genres, stored as lists in string form. To count the genres and artists in the following investigations, we parse these strings and flatten the resulting lists to 1 dimension:

In [30]:
def flatten_attr(df: pd.DataFrame, attr: str = 'artist_genres') -> List[str]:
    """
    Flattens a specified attribute in a DataFrame by extracting elements from lists stored as strings.

    This function iterates over each row in the DataFrame, evaluates the specified attribute
    as a Python literal, and extends a list with elements from these lists.

    Args:
        df (pd.DataFrame): The DataFrame containing the attribute to flatten.
        attr (str): The attribute (column) in the DataFrame to flatten. Defaults to 'artist_genres'.

    Returns:
        List[str]: A 1d list containing all elements from the specified attribute lists in the DataFrame.
    """
    genres = []
    for i, row in df.iterrows():
        try:
            # Safely evaluate the string as a Python literal
            genre_list = ast.literal_eval(row[attr])
            # Ensure the evaluated value is a list
            if isinstance(genre_list, list):
                genres.extend(genre_list)
        except (ValueError, SyntaxError) as e:
            print(f"Skipping row {i} with value {row[attr]}: {e}")
    return genres
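
As a minimal illustration of this parsing step, the sketch below applies the same `ast.literal_eval` idea to a few hand-written strings. The values are hypothetical and not taken from the playlists, and the helper name `flatten_strings` is an assumption made for this example. Missing cells (NaN) cannot be parsed and are skipped, which corresponds to the "Skipping row ..." messages printed by `flatten_attr`.

```python
import ast
from collections import Counter
from typing import Iterable, List


def flatten_strings(values: Iterable[object]) -> List[str]:
    """Parse lists stored as strings and concatenate their elements."""
    flat: List[str] = []
    for value in values:
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # A missing cell such as float('nan') is not a valid literal: skip it
            continue
        if isinstance(parsed, list):
            flat.extend(parsed)
    return flat


# Hypothetical cell contents, including one missing value
raw = ["['metalcore', 'alternative metal']", "['alternative metal']", float("nan")]
flat = flatten_strings(raw)
counts = Counter(flat)
print(flat)
print(counts.most_common(1))
```

The flattened list can then be counted directly, which is what the genre and artist figures rely on.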

Combining Marco's and Lars's playlists into one:

In [31]:
def combine_dfs(df_lars: pd.DataFrame, df_marco: pd.DataFrame) -> pd.DataFrame:
    """
    Combines two DataFrames by adding an 'owner' column to each and concatenating them.

    Args:
        df_lars (pd.DataFrame): DataFrame containing Lars's data.
        df_marco (pd.DataFrame): DataFrame containing Marco's data.

    Returns:
        pd.DataFrame: A combined DataFrame with an 'owner' column indicating the source of each row.
    """
    df_lars['owner'] = 'Lars'
    df_marco['owner'] = 'Marco'
    df = pd.concat([df_lars, df_marco], axis=0)
    return df

# Assuming df_lars and df_marco are already defined and loaded
df = combine_dfs(df_lars, df_marco)

Transforming the duration of the songs from ms to min:

In [32]:
def to_min(df: pd.DataFrame) -> pd.DataFrame:
    """
    Converts the 'duration_ms' column in a DataFrame to 'duration_min' in minutes.

    Args:
        df (pd.DataFrame): DataFrame containing a 'duration_ms' column.

    Returns:
        pd.DataFrame: The DataFrame with an additional 'duration_min' column in minutes.
    """
    df['duration_min'] = df['duration_ms'] / 60000
    return df

df = to_min(df)

Graphical Representations

Figure 1 - Top 10 Most Listened Genres

In the first plot we compare the most listened genres of the 2 playlists with each other.

In [33]:
df_marco.columns
Out[33]:
Index(['track_name', 'added_at', 'release_date', 'release_date_precision',
       'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'type', 'duration_ms', 'time_signature', 'uri', 'artist_names',
       'artist_genres', 'artist_popularity', 'owner'],
      dtype='object')
In [34]:
def genre_comparison(df_lars: pd.DataFrame, df_marco: pd.DataFrame) -> None:
    """
    Compares the top 10 most listened genres between two DataFrames and creates a horizontal bar plot.

    Args:
        df_lars (pd.DataFrame): DataFrame containing Lars's data.
        df_marco (pd.DataFrame): DataFrame containing Marco's data.
    """
    # Flatten the genres attribute
    genres_lars = flatten_attr(df_lars)
    most_listened_genres_lars = pd.Series(genres_lars).value_counts().sort_values(ascending=True)[-10:]
    genres_marco = flatten_attr(df_marco)
    most_listened_genres_marco = pd.Series(genres_marco).value_counts().sort_values(ascending=True)[-10:]

    # Create subplots
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Lars", "Marco"), horizontal_spacing=0)

    # Add horizontal bar plot for Lars
    fig.add_trace(go.Bar(
        y=most_listened_genres_lars.index,
        x=most_listened_genres_lars.values / len(genres_lars) * 100,
        name='Lars',
        orientation='h',
        text=most_listened_genres_lars.index,
        textposition='inside',
        textfont=dict(
            size=15
    )), row=1, col=1)

    # Add horizontal bar plot for Marco
    fig.add_trace(go.Bar(
        y=most_listened_genres_marco.index,
        x=most_listened_genres_marco.values / len(genres_marco) * 100,
        name='Marco',        
        orientation='h',
        text=most_listened_genres_marco.index,
        textposition='inside',
        textfont=dict(
            size=15
        )

    ), row=1, col=2)

    # Update layout
    fig.update_layout(
        title={
            'text': 'Top 10 Most Listened Genres',
            'font': {
                'size': 24  # Change this value to the desired text size
            }
        },
        height=800,
        showlegend=False,
        yaxis=dict(showticklabels=False),
        yaxis2=dict(showticklabels=False),
        font=dict(
            size=15
        )
    )

    fig.update_xaxes(range=[14, 0], row=1, col=1)
    fig.update_xaxes(range=[0, 14], row=1, col=2)

    # xlabel with annotation
    fig.add_annotation(
        x=0.5,
        y=-0.1,
        xref='paper',
        yref='paper',
        text='Number of Songs per Genre (%)',
        showarrow=False,
        font=dict(
            size=16
        )
    )
    # Show plot
    fig.show()

genre_comparison(df_lars=df_lars, df_marco=df_marco)
Skipping row 575 with value nan: malformed node or string: nan
Skipping row 576 with value nan: malformed node or string: nan
Skipping row 577 with value nan: malformed node or string: nan

Interpretation

It is quickly apparent that both playlists contain mainly metal music; the genres shown are essentially subgenres of metal. Alternative metal is the most frequent genre in both playlists. Many genres appear in both top 10 lists, but at different positions, metalcore for example.
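
When reading the percentages in Figure 1, note that `genre_comparison` divides each genre count by `len(genres_lars)` or `len(genres_marco)`, i.e. by the total number of genre tags. Because a song can carry several genres, the bars show shares of all genre tags rather than shares of songs. The following sketch reproduces that calculation on a tiny hypothetical tag list; the function name `top_shares` is an assumption for this example.

```python
from collections import Counter
from typing import List, Tuple


def top_shares(labels: List[str], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n most common labels with their share of all labels in percent."""
    total = len(labels)
    return [(label, count / total * 100)
            for label, count in Counter(labels).most_common(n)]


# Hypothetical flattened genre tags (not the real playlist data)
genre_tags = ["alternative metal", "metalcore", "alternative metal", "nu metal"]
shares = top_shares(genre_tags, n=2)
for label, share in shares:
    print(f"{label}: {share:.1f}%")
```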

Figure 2 - Top 10 Most Listened Artists

In [35]:
def artist_comparison(df_lars: pd.DataFrame, df_marco: pd.DataFrame) -> None:
    """
    Compares the top 10 most listened artists between two DataFrames and creates a horizontal bar plot.

    Args:
        df_lars (pd.DataFrame): DataFrame containing Lars's data.
        df_marco (pd.DataFrame): DataFrame containing Marco's data.
    """
    # Flatten the 'names' attribute
    artists_lars = flatten_attr(df_lars, 'artist_names')
    most_listened_songs_lars = pd.Series(artists_lars).value_counts().sort_values(ascending=True)[-10:]
    artists_marco = flatten_attr(df_marco, 'artist_names')
    most_listened_songs_marco = pd.Series(artists_marco).value_counts().sort_values(ascending=True)[-10:]

    # Create subplots
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Lars", "Marco"), horizontal_spacing=0)

    # Add horizontal bar plot for Lars
    fig.add_trace(go.Bar(
        y=most_listened_songs_lars.index,
        x=most_listened_songs_lars.values / df_lars.shape[0] * 100,
        name='Lars',
        orientation='h',
        text=most_listened_songs_lars.index,
        textposition='inside',
        textfont=dict(
            size=15
        )
    ), row=1, col=1)

    # Add horizontal bar plot for Marco
    fig.add_trace(go.Bar(
        y=most_listened_songs_marco.index,
        x=most_listened_songs_marco.values / df_marco.shape[0] * 100,
        name='Marco',
        orientation='h',
        text=most_listened_songs_marco.index,
        textposition='inside',
        textfont=dict(
            size=15
        )
    ), row=1, col=2)

    # Update layout
    fig.update_layout(
        title={
            'text': 'Top 10 Most Listened Artists',
            'font': {
                'size': 24  # Adjust the text size as needed
            }
        },
        height=800,
        showlegend=False,
        yaxis=dict(showticklabels=False),
        yaxis2=dict(showticklabels=False),
        font=dict(
            size=15
        )
    )

    # Update x-axis range for both subplots
    fig.update_xaxes(range=[10, 0], row=1, col=1)
    fig.update_xaxes(range=[0, 10], row=1, col=2)

    # Add annotation for x-axis label
    fig.add_annotation(
        x=0.5,
        y=-0.1,
        xref='paper',
        yref='paper',
        text='Number of Songs per Artist (%)',
        showarrow=False,
        font=dict(
            size=16
        )
    )

    fig.add_annotation(x = 8.5, y = 'Trivium', xref='paper', yref='paper', text='<span style="font-size:24px;">&#8592;</span> Trivium: 24.3%', showarrow=False, font=dict(size=15, color='white'), col=1, row=1)
    # Show the plot
    fig.show()

# Call the function to display the comparison
artist_comparison(df_lars, df_marco)
Skipping row 575 with value nan: malformed node or string: nan
Skipping row 576 with value nan: malformed node or string: nan
Skipping row 577 with value nan: malformed node or string: nan

Interpretation:

It's easy to see that Lars is a big fan of the band Trivium, which makes up about 24% of his playlist. The rest of Lars's playlist is also more artist-focused: it contains more songs by the same artists than Marco's playlist does. Marco's favorite band is the German band 'Electric Callboy'.
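
One possible way to put a number on "artist-focused" is the share of songs that feature at least one of the k most frequent artists. This measure is not part of the original analysis; the sketch below is only an assumption of how it could be computed, and both the function name `top_k_share` and the toy playlists are hypothetical.

```python
from collections import Counter
from typing import List


def top_k_share(song_artists: List[List[str]], k: int = 3) -> float:
    """Percentage of songs featuring at least one of the k most frequent artists."""
    counts = Counter(artist for artists in song_artists for artist in artists)
    top = {artist for artist, _ in counts.most_common(k)}
    hits = sum(1 for artists in song_artists if top.intersection(artists))
    return hits / len(song_artists) * 100


# Hypothetical playlists: one dominated by a single artist, one spread out
focused = [["A"], ["A"], ["A"], ["B"]]
varied = [["A"], ["B"], ["C"], ["D"]]
print(top_k_share(focused, k=1), top_k_share(varied, k=1))
```

A higher value for the same k indicates a playlist that concentrates on fewer artists.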

Figure 3 - Shared Genres between Lars's and Marco's Playlist

In [36]:
def common_genres_songs_venn(df_lars: pd.DataFrame, df_marco: pd.DataFrame) -> None:
    """
    Generates a Venn diagram and a table to visualize common genres between two playlists.

    This function calculates the common genres between two DataFrames representing different
    playlists, and visualizes the result using a Venn diagram and a table of the top common genres.

    Args:
        df_lars (pd.DataFrame): DataFrame containing Lars's playlist data.
        df_marco (pd.DataFrame): DataFrame containing Marco's playlist data.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 8), gridspec_kw={'width_ratios': [2, 1]})
    plt.subplots_adjust(wspace=0.3, hspace=0)

    # Calculate common genres and their counts
    marco_genres = flatten_attr(df_marco)
    lars_genres = flatten_attr(df_lars)
    common_genres = set(marco_genres).intersection(set(lars_genres))

    # Number of genres which are not shared
    nr_venn_marco = len(set(marco_genres)) - len(set(common_genres))
    nr_venn_lars = len(set(lars_genres)) - len(set(common_genres))

    

    marco_genre_counts = Counter(marco_genres)
    lars_genre_counts = Counter(lars_genres)
    common_genre_counts = pd.Series({genre: min(marco_genre_counts[genre], lars_genre_counts[genre]) for genre in common_genres}).sort_values(ascending=False).head(10)


    # Plot Venn diagram on the first axis
    venn = venn2(subsets=(nr_venn_lars, nr_venn_marco, len(common_genres)), set_labels=('Lars', 'Marco'),
                 set_colors=('mediumaquamarine', 'slategrey'), ax=ax1)
    venn2_circles(subsets=(nr_venn_lars, nr_venn_marco, len(common_genres)), linestyle='dotted', linewidth=1, color='k', alpha=0.5, ax=ax1)
    ax1.set_title("Common Genres in Lars's and Marco's Playlists", fontsize=16, fontweight='bold', loc='left')

    # Plot the table on the second axis with overlap measure included
    ax2.axis('off')  # Turn off the axis
    table_data = [[genre] for genre in common_genre_counts.index]
    table_data.append(['...'])
    table = ax2.table(cellText=table_data, colLabels=['Genre'], cellLoc='left', loc='center', colColours=['wheat'], bbox=[-0.4, 0.2, 0.7, 0.6])

    # Adjust font and layout
    table.auto_set_font_size(False)
    table.set_fontsize(12)
    table.auto_set_column_width([0, 1, 2])
    table.scale(1, 1.5)

    # Make an arrow
    ax1.annotate(
        '', xy=(0.39, 0.5), xycoords='axes fraction',  # Arrow tip (adjust as necessary)
        xytext=(1.16, 0.8), textcoords='axes fraction',  # Arrow tail (adjust as necessary)
        arrowprops=dict(facecolor='black', shrink=0.05, width=1.5, headwidth=10, headlength=10, connectionstyle='arc3,rad=0.25')
    )

    # Adjust layout and font
    plt.rcParams.update({'font.size': 12, 'font.family': 'monospace'})
    plt.savefig('common_genres_venn.pdf', bbox_inches='tight')
    plt.show()

common_genres_songs_venn(df_lars, df_marco)
Skipping row 575 with value nan: malformed node or string: nan
Skipping row 576 with value nan: malformed node or string: nan
Skipping row 577 with value nan: malformed node or string: nan
[Figure 3: Venn diagram of the genres in Lars's and Marco's playlists, next to a table listing the top shared genres]

Interpretation:

Lars's playlist covers a total of 86 different genres and Marco's covers 226; 59 of these genres appear in both. A large part of the genres in Lars's playlist is therefore shared with Marco's playlist. Marco's playlist, however, contains more than twice as many genres as Lars's, and many of them are not shared with Lars's playlist.
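
The region sizes of the Venn diagram follow directly from these three numbers. The short calculation below works them out, together with the Jaccard index as one common overlap measure; the Jaccard index is an addition for illustration and is not used in the original analysis.

```python
# Totals taken from the interpretation above
n_lars, n_marco, n_shared = 86, 226, 59

only_lars = n_lars - n_shared          # genres only in Lars's playlist: 27
only_marco = n_marco - n_shared        # genres only in Marco's playlist: 167
n_union = n_lars + n_marco - n_shared  # distinct genres overall: 253

jaccard = n_shared / n_union           # shared genres relative to all genres
share_of_lars = n_shared / n_lars      # fraction of Lars's genres that are shared
share_of_marco = n_shared / n_marco    # fraction of Marco's genres that are shared

print(only_lars, only_marco, n_union)
print(f"{jaccard:.3f} {share_of_lars:.1%} {share_of_marco:.1%}")
```

About 69% of Lars's genres also occur in Marco's playlist, but only about 26% of Marco's genres occur in Lars's.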

Figure 4 - Shared Songs between Lars's and Marco's Playlist

In [37]:
def common_genres_songs_venn(df_lars: pd.DataFrame, df_marco: pd.DataFrame) -> None:
    """
    Generates a Venn diagram and a table to visualize common songs between two playlists.

    This function finds common songs between two DataFrames representing different playlists,
    and visualizes the result using a Venn diagram and a table of the common songs with their artists.

    Args:
        df_lars (pd.DataFrame): DataFrame containing Lars's playlist data.
        df_marco (pd.DataFrame): DataFrame containing Marco's playlist data.
    """
    # Find common songs
    common_songs = df_lars[df_lars['track_name'].isin(df_marco['track_name'])]
    nr_common_songs = len(common_songs)
    nr_lars_songs = len(df_lars) - nr_common_songs
    nr_marco_songs = len(df_marco) - nr_common_songs

    # Prepare data for the table
    common_songs_list = common_songs[['artist_names', 'track_name']]
    common_songs_list.columns = ['Artist', 'Song']
    
    # Extract the first artist of each song from the stringified list of names
    artists = [ast.literal_eval(names)[0] for names in common_songs['artist_names']]

    # Create a figure with two axes
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 7),gridspec_kw={'width_ratios': [2, 1]})
    plt.subplots_adjust(wspace=0.3, hspace=0)

    # Plot the Venn diagram on the first axis
    venn = venn2(subsets=(nr_lars_songs, nr_marco_songs, nr_common_songs), set_labels=('Lars', 'Marco'), set_colors=('mediumaquamarine', 'slategrey'), ax=ax1)
    venn2_circles(subsets=(nr_lars_songs, nr_marco_songs, nr_common_songs), linestyle='dotted', linewidth=1, color='k', alpha=0.5, ax=ax1)
    ax1.set_title("Common Songs in Lars's and Marco's Playlists", fontsize=16, fontweight='bold', loc = 'left')

    # Plot the table on the second axis
    ax2.axis('off')  # Turn off the axis
    table_data = [[artist, song] for artist, song in zip(artists, common_songs_list['Song'])]
    table = ax2.table(cellText=table_data, colLabels=['Artist', 'Song'], cellLoc='left', loc='center', colColours=['wheat', 'wheat'], bbox=[-0.4, 0.105, 1.6, 0.75])

    # Adjust font and layout
    table.auto_set_font_size(False)
    table.set_fontsize(11)
    table.auto_set_column_width([0, 1])
    table.scale(1, 1.5)

    # Make an arrow
    ax1.annotate(
        '', xy=(0.39, 0.5), xycoords='axes fraction',  # Arrow tip (adjust as necessary)
        xytext=(1.065, 0.92), textcoords='axes fraction',  # Arrow tail (adjust as necessary)
        arrowprops=dict(facecolor='black', shrink=0.05, width=1.5, headwidth=10, headlength=10, connectionstyle="arc3,rad=0.2")
    )
    plt.savefig('common_songs_venn.pdf', bbox_inches='tight')
    plt.show()

# Assuming the dataframes are defined and loaded
common_genres_songs_venn(df_lars=df_lars, df_marco=df_marco)
[Figure 4: Venn diagram of the songs in Lars's and Marco's playlists, next to a table of the shared songs and their artists]

Interpretation:

Lars's playlist contains 238 songs and Marco's playlist contains 578 songs. Of these, 18 songs are included in both playlists. Several bands appear more than once among the shared songs, so the shared songs often also point to shared artists.
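
Note that the code for Figure 4 matches songs on `track_name` only, so two different songs that happen to share a title would be counted as common. A stricter variant could match on a combined key such as title and first artist, or on the `uri` column, assuming both playlists reference the same Spotify track entries. The sketch below shows the difference on hypothetical data; the type alias `Song` and the function `shared_songs` are assumptions for this example.

```python
from typing import List, Set, Tuple

Song = Tuple[str, str]  # (track_name, first artist)


def shared_songs(a: List[Song], b: List[Song]) -> Set[Song]:
    """Songs that occur in both playlists with the same title and artist."""
    return set(a) & set(b)


# Hypothetical playlists: 'Intro' exists in both, but by different bands
lars = [("Intro", "Band A"), ("Anthem", "Band B")]
marco = [("Intro", "Band C"), ("Anthem", "Band B")]

by_title = {title for title, _ in lars} & {title for title, _ in marco}
by_title_and_artist = shared_songs(lars, marco)
print(len(by_title), len(by_title_and_artist))
```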

Figure 5 - Playlist Characteristics

In [38]:
def box_plots(df: pd.DataFrame) -> None:
    """
    Generates box plots for different playlist characteristics (tempo, duration, and positivity) for two playlist owners.

    Args:
        df (pd.DataFrame): DataFrame containing playlist data with columns 'owner', 'tempo', 'duration_min', and 'valence'.
    """
    # Create subplots
    fig = make_subplots(rows=1, cols=3, subplot_titles=('Tempo of Tracks', 'Duration of Tracks', 'Conveyed Positivity of Tracks'))

    # Add the first box plot for 'tempo'
    fig.add_trace(
        go.Box(x=df['owner'], y=df['tempo'], name='Tempo', marker_color='blue'),
        row=1, col=1
    )

    # Add the second box plot for 'duration_min'
    fig.add_trace(
        go.Box(x=df['owner'], y=df['duration_min'], name='Duration', marker_color='orange'),
        row=1, col=2
    )

    # Add the third box plot for 'valence'
    fig.add_trace(
        go.Box(x=df['owner'], y=df['valence'], name='Positivity', marker_color='green'),
        row=1, col=3
    )

    # Update layout
    fig.update_layout(
        title_text='Playlist Characteristics',
        showlegend=False
    )

    fig.update_layout(height=600, title_x=0.5, title_font_size=24, font_size=15)

    # Label the y-axis of each subplot
    fig.update_yaxes(title_text="Tempo in BPM", row=1, col=1)
    fig.update_yaxes(title_text="Duration in Minutes", row=1, col=2)
    fig.update_yaxes(title_text="Positivity", row=1, col=3)

    # Add a shared x-axis label below the subplots
    fig.add_annotation(
        x=0.5,
        y=-0.1,
        xref='paper',
        yref='paper',
        text='Owner of the Playlist',
        showarrow=False,
        font=dict(
            size=16
        )
    )

    # Show the plot
    fig.show()
box_plots(df)

Notes:

  • The positivity measure ranges from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

  • This is only a small part of the available features.

Interpretation:

The tempo (BPM) distributions of the 2 playlists are similar. Lars's playlist tends to contain longer songs. The median positivity of Marco's playlist is higher than that of Lars's playlist, which means that Marco's playlist tends to be more positive.
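
The comparison of medians read off the box plots can also be computed directly. The sketch below shows the idea with the standard library only; the valence values are hypothetical placeholders chosen for illustration and are not the real playlist data.

```python
from statistics import median
from typing import Dict, List

# Hypothetical valence values per playlist owner (NOT the real data)
valence: Dict[str, List[float]] = {
    "Lars": [0.20, 0.30, 0.45],
    "Marco": [0.35, 0.50, 0.60],
}

# Median per owner, then the owner with the highest median
medians = {owner: median(values) for owner, values in valence.items()}
more_positive = max(medians, key=medians.get)
print(medians, more_positive)
```

With the combined DataFrame from above, the same comparison would amount to grouping by `owner` and taking the median of `valence`.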

Figure 6 - Added Songs per Month

In [39]:
def playlist_behavior(df_lars: pd.DataFrame, df_marco: pd.DataFrame) -> None:
    """
    Creates histograms showing the number of songs added per month for Lars and Marco, including annotations for significant events.

    Args:
        df_lars (pd.DataFrame): DataFrame containing Lars's playlist data with a column 'added_at' for the date songs were added.
        df_marco (pd.DataFrame): DataFrame containing Marco's playlist data with a column 'added_at' for the date songs were added.
    """
    # Histogram of the number of songs added per month (one bin per month)

    # Create subplots
    fig = make_subplots(rows=1, cols=2, subplot_titles=('Added Songs per Month Lars', 'Added Songs per Month Marco'))

    # Add histogram for Lars
    fig.add_trace(
        go.Histogram(x=df_lars['added_at'], name='Lars', marker_color='blue', xbins=dict(size = 'M1')),
        row=1, col=1
    )

    # Add histogram for Marco
    fig.add_trace(
        go.Histogram(x=df_marco['added_at'], name='Marco', marker_color='orange', xbins=dict(size = 'M1')),
        row=1, col=2
    )

    # Update layout
    fig.update_layout(
        title_text='Number of Added Songs per Month',
        showlegend=False,
        xaxis_title='Date',
        yaxis_title='Number of Songs',
        bargap=0.2,
        height=600,
    )

    # Start and end of the lockdown period as datetime objects
    beg_first_lockdown = pd.to_datetime('2020-12-16')
    # Decision of 
    end_first_lockdown = pd.to_datetime('2021-05-01')

    # Add a shaded rectangle spanning the lockdown period
    fig.add_shape(
        dict(
            type='rect',
            x0=beg_first_lockdown,
            y0=0,
            x1=end_first_lockdown,
            y1=50,
            line=dict(
                color='red',
                width=2
            ),
            fillcolor='red',
            opacity=0.2,
            layer='above',
        )
    )
    # Add now a text to the shape which says Lockdown 2
    fig.add_annotation(
        x='2021-05-05',
        y=40,
        xref='x',
        yref='y',
        text='Lockdown 2',
        showarrow=True,
        arrowhead=1,
        ax=80,
        ay=-40,
        font=dict(
            size=16,
            color='black'
        )
    )

    fig.add_annotation(
        x='2023-07-01',
        y=400,
        xref='x',
        yref='y',
        text='New Playlist from Existing Playlist',
        showarrow=True,
        arrowhead=1,
        ax=150,
        ay=-40,
        font=dict(
            size=16,
            color='black'
        ), col=2, row=1
    )
    # Rectangle for the university semester
    semester_start = pd.to_datetime('2024-03-15')
    semester_end = pd.to_datetime('2024-07-27')

    fig.add_shape(
        dict(
            type='rect',
            x0=semester_start,
            y0=0,
            x1=semester_end,
            y1=600,
            fillcolor='red',
            opacity=0.2,
            layer='above',
            line=dict(
                color='red',
                width=2
            ),
        ), col = 2, row = 1
    )
    # Add an annotation for the semester
    fig.add_annotation(
        x='2024-03-12',
        y=200,
        xref='x',
        yref='y',
        text='University Semester',
        showarrow=True,
        arrowhead=1,
        ax=-100,
        ay=-40,
        font=dict(
            size=16,
            color='black'
        ), col=2, row=1
    )

    # y limit for second plot
    fig.update_yaxes(range=[0, 550], row=1, col=2)
    # Show the plot
    fig.show()

playlist_behavior(df_lars=df_lars, df_marco=df_marco)

Notes:

  • The exact dates of the lockdown are not easy to determine. A 'lockdown light' was already introduced in Germany on November 2, 2020. The hard lockdown was decided on December 13, took effect on December 16 (the start date used in the plot), and continued until May 2021.

Interpretation:

Lars and Marco added songs to their playlists in very different ways. Marco added most of his songs in the first month. After consulting the domain expert, Marco, I learned that he had created the playlist from an older one. Lars added songs much more consistently, although a noticeably large share was added during the second lockdown. Zooming into May 2024 shows that Marco started adding a large number of songs, more than in any other month after the playlist's creation. This could be related to the start of the semester at the FH in Kiel.
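
The monthly binning that the histograms perform with `xbins=dict(size='M1')` can be sketched without Plotly as a simple count per calendar month. The example assumes that `added_at` holds ISO 8601 timestamps with a trailing `Z`, which is an assumption about the data format; the timestamps and the function name `songs_per_month` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def songs_per_month(added_at: List[str]) -> Dict[str, int]:
    """Count timestamps per calendar month, keyed as 'YYYY-MM'."""
    months = (
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m")
        for ts in added_at
    )
    return dict(Counter(months))


# Hypothetical timestamps in the assumed format
timestamps = ["2021-01-03T10:00:00Z", "2021-01-20T18:30:00Z", "2021-03-01T09:15:00Z"]
per_month = songs_per_month(timestamps)
print(per_month)
```

Such a table makes it easy to check statements like "the most songs in a single month" numerically instead of reading them off the plot.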

Summary

Both Lars's and Marco's playlists consist largely of metal. Some songs appear in both playlists, and all of these are metal songs. We also found that Lars's playlist tends to contain longer songs, as well as songs that convey more musical negativity such as aggression and sadness. Lars and Marco maintained their playlists in very different ways: Marco added most songs on the day of creation, whereas Lars added songs continuously, with a peak during the second lockdown in Germany. A further aim of this project was to investigate to what extent it is possible to develop an application that compares 2 Spotify playlists and provides rich information. This was successful.

Limitations:

Part of this idea was also to develop a tool that automatically compares Spotify playlists. However, the analysis carried out here contains annotations and texts that are customized to the two playlists, which a web tool cannot produce without human help.

Future Features:

  • Top genres and artists were compared in Figures 1 and 2. Artists and genres that appear in both playlists could be highlighted with the same colors.

  • Automation should also be improved for a good application.
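
The color highlighting proposed in the first point could be prepared with a small lookup table, sketched below under stated assumptions: the function name `label_colors`, the palette, and the example labels are all hypothetical. Labels that occur in both lists receive a palette color, all other labels a neutral default.

```python
from typing import Dict, Sequence


def label_colors(labels_a: Sequence[str], labels_b: Sequence[str],
                 palette: Sequence[str], default: str = "lightgrey") -> Dict[str, str]:
    """Give labels that occur in both sequences a palette color, all others a default."""
    in_b = set(labels_b)
    colors = {label: default for label in [*labels_a, *labels_b]}
    shared = [label for label in labels_a if label in in_b]
    for i, label in enumerate(shared):
        # Cycle through the palette if there are more shared labels than colors
        colors[label] = palette[i % len(palette)]
    return colors


# Hypothetical top lists of two playlists
colors = label_colors(["metalcore", "nu metal"], ["pop", "metalcore"],
                      ["crimson", "royalblue"])
print(colors)
```

The resulting dictionary could then be turned into one color per bar, for example as a list passed to the bar trace's marker color, so that a shared genre or artist looks the same in both subplots.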

Code Assistance

  • GitHub Copilot: Version 1.206.0.0

  • ChatGPT: Version 3.5